Config

Data ingestion

Data exploration

features to explore:

The target is either weighted_average_vote or mean_vote.

This table links movies to names. Features to explore:

Data formatting

Feature engineering

Target computation

Historical ratings for film crew

Movies

Data join

Filtering

Historical rating features are unavailable for about half of the movies because of missing records in the title_principals dataset. The assumption here is that the data exists but has not been updated yet. I therefore chose to keep the features and drop the rows with missing values, assuming the missing title_principals data will become available.
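A minimal sketch of this filtering step, assuming the joined data is a pandas DataFrame. The column names (imdb_title_id, director_hist_rating, cast_hist_rating) and the toy values are hypothetical and only illustrate the idea of keeping the feature columns while dropping rows where they are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names and values are illustrative only.
movies = pd.DataFrame({
    "imdb_title_id": ["tt01", "tt02", "tt03", "tt04"],
    "duration": [90, 120, 105, 98],
    "director_hist_rating": [6.5, np.nan, 7.1, np.nan],
    "cast_hist_rating": [6.0, np.nan, 6.8, 5.9],
})

# Assumed names for the historical rating features.
hist_cols = ["director_hist_rating", "cast_hist_rating"]

# Share of movies missing at least one historical rating feature.
missing_share = movies[hist_cols].isna().any(axis=1).mean()
print(f"rows with missing historical features: {missing_share:.0%}")

# Keep the feature columns, drop the rows where any of them is missing.
filtered = movies.dropna(subset=hist_cols).reset_index(drop=True)
print(filtered)
```

Checking the missing share before dropping makes the cost of the assumption visible: if the share is large, the filtered sample may no longer be representative of all movies.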

Train validation split

Resampling

Because the dataset is imbalanced, we resample it with random oversampling to a 1:10 ratio.
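The resampling step could look like the sketch below. It assumes a binary label and reads 1:10 as one minority row per ten majority rows. Following the order of the sections, it is applied to the training split only. The helper random_oversample, the column names, and the toy data are hypothetical. The sampling is written by hand with pandas to keep the example self-contained; imbalanced-learn's RandomOverSampler with sampling_strategy=0.1 is a common alternative.

```python
import pandas as pd

def random_oversample(df, label_col, minority_label, ratio=0.1, seed=0):
    """Duplicate random minority rows until minority / majority == ratio.

    Hypothetical helper: assumes a binary label column.
    """
    minority = df[df[label_col] == minority_label]
    majority = df[df[label_col] != minority_label]
    target_n = int(round(ratio * len(majority)))
    n_extra = target_n - len(minority)
    if n_extra <= 0 or minority.empty:
        # Already at or above the requested ratio, or nothing to sample from.
        return df.copy()
    # Sample with replacement so the same minority row can be drawn repeatedly.
    extra = minority.sample(n=n_extra, replace=True, random_state=seed)
    out = pd.concat([df, extra], ignore_index=True)
    # Shuffle so the duplicated rows are not clustered at the end.
    return out.sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy training split: 20 minority rows against 1000 majority rows (1:50).
train = pd.DataFrame({"x": range(1020), "label": [1] * 20 + [0] * 1000})

resampled = random_oversample(train, "label", minority_label=1)
counts = resampled["label"].value_counts()
print(counts)
```

Leaving the validation split untouched keeps the evaluation metrics tied to the original class distribution.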

Modeling

Pipeline

Feature selection

Model tuning

Model evaluation

Model insights